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TEXT: 

using A Relational System On wall Street: The Good, The Bad, The ugly, 
And The Ideal conventional wisdom has it that complex decision support is 
an extremely promising application of the relational model. Partly for this 
reason, one of the authors (Rozen) was hired to design the database for a 
decision support system used by wall street investment managers. The 
application, based on the Oracle relational database management system, was 
to replace a previous one written on top of a file system. Problems with 
the previous application system included the inability to enhance its 
functionality to suit new analytical applications, the inability to 
exchange data with other company information, and an antiquated teletype 
user interface. 

The developers were able to reimplement the functionality of the old 
system in seven months, and their success depended largely on the virtues 
of the relational model. However, problems still remained. To illustrate 
both the promise and the problems, in this article, we focus on several 
critical design requirements and the ability of the relational system to 
satisfy them, in all important respects, our analysis is independent of 
Oracle and reflects issues that the use of any relational database 
management system would raise. 

Our view of the relational model is that defined by standard SQL or 
quel. Technically, these are languages equivalent in expressive power to 
the relational calculus augmented with aggregates (and grouping), a small 
set of functions on column domains (e.g., addition, substring), a data 
definition language, update operations, and sorting operators, we assume 
that relational systems support sharing (concurrency control) and 
transaction recovery. 

THE APPLICATION 

The application, called BondDB, is designed to support buying and 
selling bonds by institutional sales people, traders, and risk managers in 
an investment bank, users of BondDB can 

(1) try to find investments for a client that are both less expensive 
than the client's current investments and more profitable as long as 
interest rates remain below some specific value, 

(2) estimate a likely decline in the value of an investment for a 
typical day, based on the recent variability of a bond's price, to try to 
keep a trader from holding too many risky bonds, or 

(3) evaluate the fair value of an investment today for different 
future interest rate scenarios. 

To perform these analyses, BondDB requires information about the 
bonds and their historical behavior. 

BondDB 's financial calculations and user interfaces made it 
impractical to construct the entire system using only a combination of SQL 
and application generation tools provided with the database management 
system. Therefore the application programs are written almost entirely in a 
conventional programming language. 

THE DATA 

BondDB contains the following data: 

(1) basic information on the characteristics of about 10,000 
different bonds, 

(2) time series of daily quotes for about 3,000 of these bonds, 
totaling some 1.8 million points, 

(3) about a dozen different kinds of bonds, and 

(4) information on portfolios of bonds. 
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we draw our examples from the representation of two main entity 
types: bonds and portfolios. Figure 1 is a simplified entity-relationship 
diagram of BondDB' s data- representation requirements. 

Bonds 

A bond is basically a loan from the holder of the bond to the issuer. 
It specifies a sequence of payments of principal and interest according to 
some rules or schedule. Here is a simple example, issuer: U.S. Treasury 
Face value: $1,000.00 Payments: 5/31/88— $38.75 11/30/88— $38.75 
5/31/89— $38.75 11/30/89— $38. 75 +$1,000.00 face value Type of Bond: 
Treasury Note Coupon: 7-3/4% Maturity date: 11/30/89 

in this case there are four semi-annual interest payments of $38.75, 
and the principal is paid back in a lump sum after two years. 

Portfolios 

A portfolio is set of bonds that have been bought in various 
quantities. Each <bond f quantity> pair is called a position. For some 
analyses BondDB also needs to record when the bond was bought and for how 
much. BondDB also includes some information about the portfolio itself, 
such as the name of its owner, and the order in which to display the 
portfolio's positions. 

BondDB AND THE RELATIONAL MODEL--HOW 
GOOD A MATCH? 

In this section, we discuss central data management requirements of 
BondDB and how the developers tried to meet them using the relational 
database management system, we then describe features of an ideal (yet, we 
believe, realizable) system that could provide superior support. 

Management of Bulk Data Types 

one of the virtues of the relational model is that statements of its 
query language operate on entire relations at a time. One can formulate 
set-oriented expressions such as the following, SELECT BOND_lD FROM BOND 

WHERE BOND.ID NOT IN (SELECT BOND_ID FROM PRICE); which prints the IDS of 

all bonds that have no price history (the good). 

a bulk data type is a composite data type whose instances contain a 
dynamically varying, and potentially large, number of elements. Thus sets 
and relations are bulk data types, as are multisets, sequences (lists), 
trees, and graphs. BondDB required usable implementations of bulk data 
types other than relations and the ability to include instances of such 
types as attributes of bonds and portfolios. 

For example, many bonds are characterized by a sequence of time 
intervals and prices indicating when and for how much the issuer can call 
the bond (i.e., prepay the loan to the bond holder). An example is the 
following: issuer: Brooklyn Union Gas Type of Bond: Preferred stock Last 
dividend: $ 2.47 Face value: $25.00 Moody's Rating: a2 SP rating: A 'call 1 
schedule: 8/31/81 to 8/30/86— $27.92 8/31/86 to 8/30/91— $27. 36 8/31/91 to 
8/30/96— $26.81 8/31/96 to 1/01/15 — $26.25 

In this case, the issuer can call the bond at $27.36 per share any 
time between 8/31/86 and 8/30/91. The most common query is to retrieve the 
call information when the other information about this security is fetched. 
A more sophisticated query might want to know the monetary value of the 
calls, which is a function of the bond's current price, price volatility, 
time to maturity, and remaining call schedule. Both queries require 
manipulation of the call SCHEDULE as a sequence. 

Representing the call schedule with multiple attributes, that is, 

BOND( . . ., CALL_DATE . Sub . 1 , CALL_PRICE . Sub . 1 , . . ., CALL_DATE . sub . 1 , 

CALL_PRlCE.sub.l, . . .), would lead to well-known problems. 

One of the problems is that maximum size of a call schedule may not 
be known when the schema is designed, if call. sub. n+1 is added, all code 
using call schedules will need to be modified. Furthermore, variables in 
SQL cannot range over columns. For example, to print just the remaining 
call dates and prices, we would have to write: select cali date . sub . 1 , 

CALL_PRICE.SUb.l FROM BOND WHERE CALI DATE . sub . 1 > TODAY; . . . SELECT 

CALL_DATE.SUb.n, CALL_PRICE . Sub . n FROM BOND WHERE CALi DATE . SUb . n > TODAY; 

Therefore the developers include a separate table for this 
information for all bonds in the database: call(bond_id, calljate, 
CALL_price) ( where bond_ID and CALL_DATE constitute the key. The 
information for the Brooklyn union Gas (bug) example is then stored in two 
tables, bond and call as seen Figure 2. To fetch all information about the 
bond BUG-PS from a relational system into a host language one needs two 
queries (the ugly): SELECT* FROM BOND WHERE BOND_id = 'BUG-PS*; and SELECT* 
FROM CALL WHERE BOND_ID = 'BUG-PS* ORDER BY CALL_DATE; 

Ideally, one would like to express this relationship directly by 
defining an attribute call_SCHEDULE of type sequence (of tuples) on the 

BOND table: CREATE TABLE BOND ( BOND_ID CHAR ( 10) , . . . CALL_SCHEDULE 



21-Apr-06 



2 



12:21 PM 



Ginger R. DeMille 



SEQUENCE ( CALL_DATE DATE, CALL„PRICE FLOAT, )...); 

Then to retrieve this information one would simply write: SELECT* 

FROM BOND; 

This would be followed by operations that can handle sequences as 
standard relational query languages handle relations. 
Procedures as Data 

BondDB's developers tood advantage of the capability of relational 
systems to store queries as views. For example, the following statements 
basically store a procedure to ensure that only the owner of a portfolio 
can modify it. CREATE VIEW UPDATE_PORTFOLIO as select* from all_portfolio 
WHERE ALL.PORTFOLIO. OWNER = USER; GRANT SELECT, UPDATE, INSERT, DELETE ON 
UPDATE_PORTFOLIO TO PUBLIC; 

in this example the stored equity, which takes advantage of the 
set-at-time relational operators, provides a succinct, executable 
specification of the modifi ability constraints on portfolios (the good). 

Some relational systems offer triggers, i.e., statements to Be 
executed whenever some change to the data, such as an insert, occurs. Many 
database management system implementors consider triggers to be an exotic 
feature, but they could have been useful to BondDB's developers. For 
example, in BondDB , to make the most recent price quickly available, it is 
stored with the bond characteristic information in the bond table. Thus an 
insert or update of the PRICE table may actually require modifications to 
two tables, PRICE and bond. BondDB could have used triggers to ensure that 
the prices stored in the BOND table were always up-to-date as shown in 
Figure 3. For a DBMS with triggers, application programs would need to 
modify PRICE only, if the need to maintain the most recent price 
disappeared, or if it became necessary to maintain the most recent five 
prices, there would be no need to modify the application programs, because 
the update action on price would be encapsulated in the database. 

BondDB's need to store procedures went further than triggers and 
views. Some data can be stored succinctly in the form of a rle, but viewed 
by client software as the data generated by the rule (an extension of the 
view concept). For example, in some, but not all, call schedules, a 
sequence of rows can be expressed as a rule, e.g., BUG-PS becomes callable 
on 8/31/81 at $27.92; thereafter the call price declines three times by 
55-2/3 cents at five-year intervals; the call price is rounded to the 
nearest cent. 

Other information can be expressed only procedurally, such as rules 
that determine the coupon of floating rate bonds based on the London 
interbank lending rate and the treasury bill rate. For example, the 
following formula determines one bond's coupon payments: 

A typical query is: For a given future interest rate scenario, and 
during a given time period, what payments will the holder receive? 

in cases such as these it would be best if the database could store 
the procedure but allow client software to view the data as a sequence (the 
ideal), what the developers have to do now is to represent BUG-PS as the 
sequence that host-language applications must manipulate (the ugly). They 
do not represent the lending-rate-dependent data at all, since it depends 
on values that are unknown in advance (the bad). 

Data Abstraction 

initially, BondDB's developers represented financial entities by a 
straightforward transcription of the normalized relational schema into C 
structs. The developers assumed it would be satisfactory to supply 
application programmers with routines which return query results as arrays 
of structs. They also assumed that the programmers could be relied on to 
manipulate the struct representations of database tuples appropriately. 
There were two main flaws with the approach: 

(1) different application programmers tended to write routines with 
overlapping functionality, and 

(2) some kinds of bonds have important information in more than one 
table (e.g., the bond and call tables in Figure 2) because of 
normalization, making it difficult for the progammer to know which tables 
contain information about a given bond. 

Thus this approach caused reliability problems from the beginning. 

Eventually, the developers realized that bonds and portfolios are 
modified and analyzed in certain stereo-typical ways. For example, 
portfolios may be fetched from or written to the database, and manipulated 
or analyzed using mathematical tools. So, to an application, the portfolio 
is a collection of data with some set of operations that can be issued on 
that data. This is the definition of an abstract data type. 

when the developers recognized the abstract data type character of 
entities represented in BondDB, they defined adts in C to represent 
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BondDB's major financial constructs, (e.g., bonds and portfolios). The 
developers use the relational system for its virtues of ensuring recovery 
and concurrency control, associative access, the facility with which they 
can set up and modify relational schemas, and the ease with which they can 
generate query code for tables, in order to operate on an entity, the 
developers must map the date from relations to the c-language structures of 
the adt, manipulate it using c-based abstract data type primitives, and 
then possibly return it to relations. 

ideally, the developers would define the abstract data types directly 
in the database system. Putting adts into the database dovetails well with 
the ideals of the previous sections, because 

(1) BondDB's ADTs have component instances of bulk data types, 

(2) one would like to store the operations of the adt in the database 
as well as the data, and 

(3) sufficiently powerful adt facilities could be used to define new 
bulk data types if necessary. 

Physical Data Independence 

The relational model provides the application program with 
independence from internal data structures in three main ways: 

(1) Queries will have the same result regardless of the physical 
organization of the tables and the availability or unavailability of 
indexes. 

(2) Application programmers need to be concerned with the order of 
columns in tables or with columns that are not referenced since columns are 
referenced by name rather than by offset from the beginning of a record, 
for example. 

(3; Stored procedures in the form of views let the database 
management system present virtual tables, providing an optional additional 
layer of independence. 

BondDB developers relied on all three kinds of physical data 
independence (the good). 

The options for specifying details of physical implementation, 
however, did not always provide enough efficiency for BondDB (the bad). 
Performance considerations forced the developers to implement part of the 
database using operating system facilities, in particular, they installed a 
copy of the bond table in shared memory, thus partly duplicating the 
function of the database management system's cache (the ugly). At that 
point they had to hand-code search structures since a linear search of the 
shared memory consumed more CPU resources than a lookup in the database 
using indexes. 

The developers also were forced to rely on reserved fields to enable 
them to incorporate additional columns to the c struct without 
simultaneously re-compiling and re-linking all application programs. The 
management headaches introduced in maintaining tne copied data were 
substantial. For example, whenever important data needed to be corrected, 
the shared memory copy had to be reinitialized making it unavailable for 
about 15 minutes. Furthermore, there was a period of several hours each day 
when the shared memory copy did not agree with the database copy. 

To say that, ideally, the hardware should have run faster is not 
helpful. At the least, an ideal database management system should provide 
more options for specifying implementation aspects. In this example, being 
able to fix a table in memory would have been helpful. 

Recovery and concurrency Control 

There is one final issue we observed that ties the physical level to 
the abstract level. One of the important steps forward in database systems 
was the decoupling of concurrency control and recovery from the data model 
(relational, network, or hierarchical). This decoupling is not always 
desirable, at least for BondDB. 

Users must be able to modify portfolios when they buy or sell bonds 
in them. Each modification may require several operations. For example, to 
add a bond, a user provides the characteristics of the bond (if it does not 
already exist in the database) and the quantity bought. To make this set of 
operations atomic (i.e., all-or-nothing) in a file system which only 
provides atomic operations on records, the developers would have to order 
the operations carefully and provide adequate redundancy to ensure 
recoverability. For example, when adding a position witn a bond not yet in 
the database, one would first save the bond characteristic information, and 
then the position because without the bond characteristic information, the 
position is meaningless. 

The transaction notion, supported by relational systems, is a big 
improvement over the facilities usually provided by a file system (the 
good). Transaction support automatically provides recoverability at the 
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level of a whole group of operations constituting, for example, the 
addition of a bond to a portfolio. 

implementing such transactions normally requires holding update locks 
until the end of the transaction (to prevent other transactions from 
reading uncommitted updates), in most systems the smallest item that can be 
locked is a page, and BondDB stored several portfolio positions per page, 
unfortunately, there is a small set of actively modified portfolios to 
which most position records are added. This results in records from 
different portfolios interleaved on the same pages being updated 
concurrently. The developers therefore frequently received the complaint 
that a user was locked out of his portfolio when no one else was accessing 
it (the bad). 

To overcome this problem, the developers defined a protocol in which 
processes only select for update (i.e., exclusively lock) the header record 
(the one containing the portfolio ID, the owner, and so on). All other 
portfolio information (hundreds of positions) is selected from the database 
without locking. BondDB then modifies the portfolio in user space. This 
works, but is mcovenient because to avoid unnecessary database updates, 
BondDB must keep track of what data in the portfolio has been modified and 
what has not. Code to keep track of how the portfolio has changed is hard 
to debug and test. Furthermore, there is the danger that an interactive SQL 
user, unaware of the protocol, might accidentally circumvent it. In an 
ideal system the unit of locking would be a portfolio, which is an abstract 
data type, so two users could modify different portfolios concurrently. 

RELATED WORK 

Codd has faulted the SQL language for inadequate handling of null 
values and other design problems, and has faulted SQL implementations for 
permitting duplicate rows. Here we assess the utility of the relational 
model per se, rather than a particular query language. For example, some 
systems may permit duplicate rows partly out of real need for 
multi-relations. Associated difficulties are often due to inconsistent or 
obscure rules for handling the duplicate rows. 

Much criticism of the relational model focuses not on its possible 
shortcomings for traditional business applications, but on either 

(1) its inadequate expressiveness for non-traditional database 
applications, such as design databases, cartographic databases, and 
multi -media databases, or 

(2) its inadequate performance for high transaction applications, 
notwithstanding recent advances in high-transaction systems such as Nonstop 
SQL. 

Sometimes both criticisms play a role, as in Peinl, Reuter, and 
sammer's discussion of a bid-matching system for a stock exchange, in this 
system problems arise because the relational system has inadequate 
knowledge of the semantics of the ADTs represented in it and therefore 
cannot make optimizations necessary for the application's stringent 
performance requirements. 

Nevertheless, some researchers have observed the shortcomings that 
BondDB encountered. For example, Bancilhon* and Maier assert that current 
systems are not satisfactory for business applications because of the 
impedance mismatch between the relational system and the host language and 
then go on to discuss possible improvements. The impedance mismatch is the 
discontinuity in type system and computational model between the database 
management system and the host language; the database management system 
utilizes set-at-a-time expressions with a non-procedural language, while 
the host language offers record- and field-level expressions in an 
imperative language. Indeed the impedance mismatch was one of the factors 
that forced BondDB 1 s developers to manage two representations of each 
financial entity and supply translation code between the two. 

Many workers are constructing systems that might be more 
satisfactory. The design space is large, but the work is promising. 

Stonebraker, citing the success of the relational model, proposes 
evolutionary changes in POSTGRES, which supports user-defined data types 
and complex objects via the storage of procedures as attribute values. 
POSTGRES also allows lower-level access when performance requirements 
demand it. Another approach to extending the relational model, by allowing 
nested relations, is taken by AIM-II and R. sup. 2. D. sup. 2. , which supports, 
in addition, user defined adts for complex objects. 

Many projects (e.g., ENCORE, garden, ins, O.sub.2., ORION, PROTEUS, 
Trellis/Owl) and even products (e.g., OPAL/Gemstone, PROBE, vbase) have 
begun to build object-oriented database management systems whose basic 
persistent objects are adt instances. 

some of the systems above, and others (e.g., starburst) allow the 
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application developer to include additional database access structures. 
Other systems (e.g., EXODUS, GENESIS) provide a set of building blocks from 
which to assemble a database management system tailored to a particular 
application. Finally, some researchers are attempting to erase the 
distinction between the programming language and the database management 
system altogether. Examples of this approach are Galileo, and PS-Algol. 

CONCLUSION 

This article attempted an empirical assessment of the tools BondDB f s 
developers had at their disposal --a relational database management system 
with a conventional programming language. 

We saw that, while relational systems are good, their limitations 
require ugly workarounds, even though realizable improvements would result 
in a cleaner application implementation. 

This was by no means a complete listing of all issues the developers 
faced, some, such as politics, are pervasive but immune to a technological 
solution, others, like the need for rapid development in a competitive 
environment depend on factors other than the data model such as the 
availability of people and machine resources. Application programmers do 
not ask for magic, however, just the best tools that can be efficiently 
implemented. 

Necessary features of an ideal system begin to emerge. It would: 

(1) include a richer selection of built-in bulk data types than 
relations only, and support data abstraction to allow users to introduce 
instances of their own complex data types— our example was call schedule 
sequences and corresponding operations as database values, 

(2) allow procedures written in arbitrary programming languages to be 
stored in the database and viewed either as procedures or as the data that 
they produce at any given time--our example was floating rate coupons, 

(3) offer more tuning options, such as the ability to fix certain 
data in main memory, and 

(4) allow locking of logical objects, such as portfolio instances. 

to summarize our challenge to designers of future database management 
systems: Go ahead, make our database. 
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